Frequent pattern mining under generalized subsumption

Authors

  • Jan Ramon
  • Jan Struyf
  • Luc De Raedt
Abstract

Frequent pattern mining (including the discovery of association rules) is an important task in data mining. Recently, there has been increasing interest in mining relational databases. Up to now, most algorithms have focused on a syntactical approach. However, the use of background knowledge would greatly improve the quality of the results. First, patterns and rules that are not equivalent from a syntactical point of view may be semantically equivalent. Taking into account the semantical relationships between patterns improves comprehensibility while decreasing the size of the discovered set of patterns. Second, while the use of background knowledge increases expressivity and therefore comes at a cost, it also makes it possible to better exploit the benefits of some optimizations. While some special cases (such as taxonomies) have been investigated before in the propositional case, our work is the first to investigate this in the most general case, using generalized subsumption to determine semantical relationships.

Our work focuses on three main topics. First, there is the problem of representing the information and generalizing definitions. We study the trie representation of sets of relational patterns, the condensed representation of sets of association rules (extending De Raedt & Ramon, 2004), and the types of background knowledge that are compatible with certain optimizations, such as the computation of canonical forms of patterns. Second, we investigate efficiency issues in this more general setting of frequent pattern mining. Relevant topics here are the transformation of logical queries (cut, once, reorder, etc.), the pack and ad-pack mechanisms that avoid re-evaluating a common prefix of a set of queries, and efficient ways to perform the monotonicity test and the equivalence test. Third, we are working on an efficient implementation and experiments.

Similar resources

Mining Frequent and Similar Patterns with Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) Data Mining Technique

Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) is a novel idea which is influenced by Attribute Oriented Induction (AOI) and Emerging Pattern (EP). AOI-HEP discovers patterns such as Total Subsumption HEP (TSHEP), Subsumption Overlapping HEP (SOHEP) and Total Overlapping HEP (TOHEP), including frequent and similar patterns. Mining TSHEP, SOHEP, TOHEP, frequent and similar patt...

Efficient homomorphism-free enumeration of conjunctive queries

Many algorithms in the field of inductive logic programming rely on a refinement operator satisfying certain desirable properties. Unfortunately, for the space of conjunctive queries under θ-subsumption, no optimal refinement operator exists. In this paper, we argue that this does not imply that frequent pattern mining in this setting cannot be efficient. As an example, we consider the problem...

Mining Tree Patterns with Partially Injective Homomorphisms

One of the main differences between ILP and graph mining is that while pattern matching in ILP is mainly defined by homomorphism (subsumption), it is the subgraph isomorphism in graph mining. Using the fact that subgraph isomorphisms are injective homomorphisms, we bridge the gap between the two pattern matching operators with partially injective homomorphisms, which are homomorphisms requiring the inje...

DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets

Frequent closed itemsets (FCI) play an important role in pruning redundant rules fast. Therefore, many algorithms for mining FCI have been developed. Algorithms based on vertical data formats have some advantages in that they need to scan the database only once and compute the support of itemsets quickly. In recent years, BitTable (Dong & Han, 2007) and IndexBitTable (Song, Yang, & Xu, 2008) approaches h...

Discovering Active and Profitable Patterns with RFM (Recency, Frequency and Monetary) Sequential Pattern Mining: A Constraint Based Approach

Sequential pattern mining is an extension of association rule mining that discovers time-related behaviors in sequence databases. It extends association rule mining by adding time to the transactions. The problem of finding association rules is concerned with intra-transaction patterns, whereas sequential pattern mining is concerned with inter-transaction patterns. Generalized Sequential Pattern (GSP) mining a...

Publication date: 2004